Overview
The EDL pipeline runs automatically every weekday at 4:00 PM IST via GitHub Actions. The workflow fetches fresh market data, processes it through all pipeline stages, and commits the compressed output back to the repository.
Workflow Configuration
The workflow is defined in `.github/workflows/daily_refresh.yml`:
Schedule Configuration
Cron Schedule
Expression: `30 10 * * 1-5`
- `30 10`: minute 30, hour 10, i.e. 10:30 AM UTC
- `* * 1-5`: every day of the month, every month, Monday through Friday
- UTC: 10:30 AM
- IST: 4:00 PM (UTC + 5:30)
- NSE closes at 3:30 PM IST
- 30-minute buffer ensures all settlement data is available
- Corporate actions and announcements are typically posted by 4 PM
The workflow only runs on weekdays (Monday-Friday) since Indian stock markets are closed on weekends.
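To make the UTC/IST conversion and the weekday filter concrete, here is a minimal Python sketch that computes the next slot matched by `30 10 * * 1-5` and shows it in IST. The helper `next_scheduled_run` is hypothetical and not part of the repository. The sketch ignores NSE trading holidays, on which the cron still fires because it only knows days of the week.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+5:30 offset (India does not observe daylight saving time).
IST = timezone(timedelta(hours=5, minutes=30))

# Fields taken from the cron expression "30 10 * * 1-5".
CRON_MINUTE = 30
CRON_HOUR = 10
WEEKDAYS = range(0, 5)  # datetime.weekday(): Monday=0 ... Friday=4


def next_scheduled_run(now_utc: datetime) -> datetime:
    """Return the first 10:30 UTC Monday-Friday slot strictly after now_utc."""
    candidate = now_utc.replace(hour=CRON_HOUR, minute=CRON_MINUTE,
                                second=0, microsecond=0)
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    # Cron day-of-week 1-5 is Monday-Friday; skip Saturday and Sunday.
    while candidate.weekday() not in WEEKDAYS:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # Friday 15 March 2024, 12:00 UTC: that day's slot has already passed.
    now = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
    run = next_scheduled_run(now)
    print(run.isoformat())                  # 2024-03-18T10:30:00+00:00
    print(run.astimezone(IST).isoformat())  # 2024-03-18T16:00:00+05:30
```

Working in UTC and converting only for display mirrors how GitHub Actions evaluates cron schedules, which is always in UTC.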
OHLCV Caching Strategy
The workflow uses the GitHub Actions cache to persist OHLCV data between runs, reducing execution time from ~35 minutes to ~5 minutes.
Cache Configuration
How It Works
Cache Key Generation
The cache key includes:
- Version: `ohlcv-v1` (bump to invalidate all caches)
- OS: `${{ runner.os }}` (resolves to `Linux` on `ubuntu-latest`)
- ISIN map hash: `${{ hashFiles('master_isin_map.json') }}`
Example key: `ohlcv-v1-Linux-8a3f2c9d...`
Cache Restoration
Before running the pipeline:
- Exact match: Restores OHLCV data for current stock universe
- Fallback: if the ISIN map changed, restores the most recently created cache whose key starts with `ohlcv-v1-Linux-` (partial match)
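The exact-then-prefix lookup can be modelled in a few lines. This is a simplified Python sketch of how the cache action chooses an entry given a primary key and a restore prefix; `select_cache` is a hypothetical helper, and the model ignores details such as branch scoping of caches.

```python
from datetime import datetime
from typing import Optional


def select_cache(primary_key: str, restore_prefix: str,
                 available: dict[str, datetime]) -> Optional[str]:
    """Pick which cache entry to restore.

    available maps existing cache keys to their creation time.
    1. An exact match on primary_key wins.
    2. Otherwise, the newest key starting with restore_prefix is used.
    3. Otherwise, None: a cache miss, i.e. a full OHLCV download.
    """
    if primary_key in available:
        return primary_key
    partial = [k for k in available if k.startswith(restore_prefix)]
    if not partial:
        return None
    return max(partial, key=lambda k: available[k])


if __name__ == "__main__":
    caches = {
        "ohlcv-v1-Linux-aaa111": datetime(2024, 3, 1),
        "ohlcv-v1-Linux-bbb222": datetime(2024, 3, 8),
    }
    # The ISIN map changed, so the new primary key has no exact match.
    print(select_cache("ohlcv-v1-Linux-ccc333", "ohlcv-v1-Linux-", caches))
    # ohlcv-v1-Linux-bbb222
```

A partial hit still leaves most OHLCV files in place, which is why the "New stock added" case runs only slightly longer than a normal daily refresh.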
Incremental Update
`fetch_all_ohlcv.py` detects existing files and only fetches:
- New trading days for existing stocks
- Full history for newly listed stocks
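The incremental rule reduces to comparing each ISIN's last stored date with today. The Python sketch below is not the code of `fetch_all_ohlcv.py`; the function `plan_fetches` and its input (a mapping from ISIN to the last date already on disk) are hypothetical. It works in calendar days and assumes the data source simply returns no rows for non-trading days.

```python
from datetime import date, timedelta
from typing import Optional


def plan_fetches(universe: list[str],
                 last_stored: dict[str, date],
                 today: date) -> dict[str, Optional[date]]:
    """Decide what to download for each ISIN in the current universe.

    Returns ISIN -> start date; None means "no local data, fetch full history".
    ISINs whose stored data already reaches today are omitted.
    """
    plan: dict[str, Optional[date]] = {}
    for isin in universe:
        last = last_stored.get(isin)
        if last is None:
            plan[isin] = None                      # newly listed or not cached yet
        elif last < today:
            plan[isin] = last + timedelta(days=1)  # only the missing days
    return plan


if __name__ == "__main__":
    plan = plan_fetches(
        universe=["INE000A00001", "INE000A00002", "INE000A00003"],  # placeholder ISINs
        last_stored={"INE000A00001": date(2024, 3, 14),
                     "INE000A00003": date(2024, 3, 15)},
        today=date(2024, 3, 15),
    )
    # First ISIN: fetch from 2024-03-15; second: full history; third: already current.
    print(plan)
```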
Cache Invalidation
The cache is invalidated when:
| Event | Reason | Impact |
|---|---|---|
| New stock listed | master_isin_map.json changes | Creates new cache key |
| Stock delisted | master_isin_map.json changes | Creates new cache key |
| Cache version bumped | Manual ohlcv-v1 → ohlcv-v2 | Forces fresh download |
| 7 days of inactivity | GitHub cache eviction policy | Old cache deleted |
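The first three rows follow directly from how the key is composed. A minimal Python sketch, assuming a local copy of `master_isin_map.json`: `build_cache_key` is a hypothetical helper, and a plain SHA-256 of the file stands in for `hashFiles()`, which hashes each matched file and then hashes those hashes, so the digest will not match GitHub's value byte-for-byte. Any edit to the ISIN map, or a bump of the version prefix, yields a different key.

```python
import hashlib
import tempfile
from pathlib import Path

CACHE_VERSION = "ohlcv-v1"  # bump to "ohlcv-v2" to force a fresh download


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes (stand-in for hashFiles())."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_cache_key(runner_os: str, isin_map: Path,
                    version: str = CACHE_VERSION) -> str:
    """Compose <version>-<runner.os>-<ISIN map hash>."""
    return f"{version}-{runner_os}-{file_digest(isin_map)}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        isin_map = Path(tmp) / "master_isin_map.json"
        isin_map.write_text('{"INE000A00001": "STOCK1"}')  # hypothetical contents
        before = build_cache_key("Linux", isin_map)
        # Simulate a new listing being added to the map.
        isin_map.write_text('{"INE000A00001": "STOCK1", "INE000A00002": "STOCK2"}')
        after = build_cache_key("Linux", isin_map)
        print(before != after)  # True: a listing change produces a new key
```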
Manual Trigger
You can manually run the workflow outside the scheduled time using the `workflow_dispatch` trigger (the Run workflow button on the workflow's page in the Actions tab).
Use cases for manual triggers:
- Testing workflow changes
- Refreshing data after market hours outside schedule
- Recovering from a failed automated run
- Forcing a fresh data pull after API changes
Files Committed
After successful pipeline execution, the workflow commits:
| File | Size | Records | Description |
|---|---|---|---|
| all_stocks_fundamental_analysis.json.gz | ~2 MB | 2,775 | Complete stock analysis (86 fields per stock) |
| sector_analytics.json.gz | ~8 KB | 12 | Sector-wise aggregated metrics |
| market_breadth.json.gz | ~10 KB | 1 | Market-wide breadth indicators |
| all_indices_list.json | ~85 KB | 194 | All market indices (uncompressed) |
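Consumers of these files need to decompress them before parsing. A minimal Python sketch using only the standard library; the helpers `load_json_gz` and `load_json` are hypothetical, and the demo writes its own small sample file because the schema of the committed files is not documented here.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any


def load_json_gz(path: Path) -> Any:
    """Decompress a gzip-compressed JSON file and parse it."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def load_json(path: Path) -> Any:
    """Parse an uncompressed JSON file such as all_indices_list.json."""
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sector_analytics.json.gz"
        # Placeholder records, written the same way a .json.gz output would be.
        with gzip.open(sample, "wt", encoding="utf-8") as fh:
            json.dump([{"sector": "A"}, {"sector": "B"}], fh)
        print(len(load_json_gz(sample)))  # 2
```

Against a checkout of the repository, `len(load_json_gz(Path("all_stocks_fundamental_analysis.json.gz")))` should be close to the 2,775 records listed, assuming the file sits at the repository root and its top level is a JSON array.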
Commit Behavior
- `[skip ci]` in the commit message: prevents triggering another workflow run
- `|| echo "No changes"`: prevents failure when the data is identical to the previous run
- Rebase strategy: uses `git pull --rebase --autostash` to avoid merge commits
Workflow Execution Time
| Scenario | Duration | Cache Status |
|---|---|---|
| First run | ~35 minutes | No cache (full OHLCV download) |
| Daily refresh | ~5-7 minutes | Cache hit (incremental update) |
| New stock added | ~6-8 minutes | Partial cache hit |
| Cache invalidated | ~35 minutes | No cache (full rebuild) |
Monitoring and Debugging
Viewing Logs
- Go to the repository's Actions tab
- Select the workflow run
- Expand each step to see its detailed logs
Common Issues
Workflow fails with 'No space left on device'
Cause: OHLCV data plus intermediate files exceed the runner's disk space.
Solution:
Cache not restoring
Cause: The cache key changed or the cache expired.
Solution: The first run after a key change will be slower. Subsequent runs will use the new cache.
Commit fails with 'nothing to commit'
Cause: Data identical to the previous run (rare; usually a manual run on a weekend or a market holiday).
Solution: This is expected behavior. The `|| echo` fallback prevents the step from failing.
Pipeline times out after 6 hours
Cause: Network issues or API rate limiting.
Solution: Manually re-run the workflow. Check the API endpoints for outages.
GitHub Actions Limits
Cost optimization:
- OHLCV caching reduces runtime by ~85% (35 min → 5 min)
- `CLEANUP_INTERMEDIATE = True` reduces storage usage
- `[skip ci]` in the commit message prevents recursive triggers
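The cost figures can be turned into a rough monthly budget. This Python sketch takes the durations from the Workflow Execution Time table and assumes about 22 scheduled runs per month (one per weekday; real months have roughly 20 to 23, and market holidays do not reduce the count because the cron still fires). The per-run saving works out to 1 − 5/35 ≈ 86%, in line with the ~85% quoted.

```python
# Durations (minutes) from the Workflow Execution Time table.
CACHED_RUN_MINUTES = (5, 7)    # "Daily refresh" row, cache hit
UNCACHED_RUN_MINUTES = 35      # "First run" / "Cache invalidated" rows

# Assumption: about 22 weekdays per month, one scheduled run each.
RUNS_PER_MONTH = 22


def monthly_minutes(minutes_per_run: float, runs: int = RUNS_PER_MONTH) -> float:
    """Total runner minutes for a month of scheduled runs."""
    return minutes_per_run * runs


def runtime_saving(cached: float, uncached: float) -> float:
    """Fraction of runtime saved per run by restoring the cache."""
    return 1 - cached / uncached


if __name__ == "__main__":
    low, high = (monthly_minutes(m) for m in CACHED_RUN_MINUTES)
    print(f"with cache: {low}-{high} min/month")  # with cache: 110-154 min/month
    print(f"without cache: {monthly_minutes(UNCACHED_RUN_MINUTES)} min/month")  # without cache: 770 min/month
    print(f"saving per run: {runtime_saving(5, 35):.0%}")  # saving per run: 86%
```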
Security Considerations
Repository Permissions
No Secrets Required
All data sources used by the pipeline are publicly accessible APIs:
- Dhan ScanX endpoints
- NSE Archives

No authentication tokens are needed.
If you fork this repository, ensure Actions are enabled in the repository settings and the workflow file is present in `.github/workflows/`.